


Creators/Authors contains: "Beck, David"


  1. Stability of proteins at high temperature has been a topic of interest for many years, as this attribute is favourable for applications ranging from therapeutics to industrial chemical manufacturing. Our current understanding and methods for designing high-temperature stability into target proteins are inadequate. To drive innovation in this space, we have curated a large dataset, learn2thermDB, of protein-temperature examples, totalling 24 million instances, and paired proteins across temperatures based on homology, yielding 69 million protein pairs - orders of magnitude larger than the current largest. This important step of pairing allows for study of high-temperature stability in a sequence-dependent manner in the big data era. The data pipeline is parameterized and open, allowing it to be tuned by downstream users. We further show that the data contains signal for deep learning. This data offers a new doorway towards thermal stability design models.

     
  2. Data science and machine learning are revolutionizing enzyme engineering; however, high-throughput simulations for screening large libraries of enzyme variants remain a challenge. Here, we present a novel but highly simple approach to comparing enzyme variants with fully atomistic classical molecular dynamics (MD) simulations on a tractable timescale. Our method greatly simplifies the problem by restricting sampling only to the reaction transition state, and we show that the resulting measurements of transition-state stability are well correlated with experimental activity measurements across two highly distinct enzymes, even for mutations with effects too small to resolve with free energy methods. This method will enable atomistic simulations to achieve sampling coverage for enzyme variant prescreening and machine learning model training on a scale that was previously not possible. 
  3. Deep eutectic solvents (DESs) are an attractive class of materials with low toxicity, broad commercial availability, low costs and simple synthesis, which allows for tuning of their properties. We develop and demonstrate the use of high-throughput and data-driven strategies to accelerate the investigation of new DES formulations. A cheminformatics approach is used to outline a design space, which results in 3477 hydrogen bond donor (HBD) and 185 quaternary ammonium salt (QAS) molecules identified as good candidate components for DES. The synthesis methodology is then adapted to a high-throughput protocol using liquid handling robots for the rapid synthesis of DES combinations. High-throughput electrochemical characterization and melting point detection systems are used to measure key performance metrics. To demonstrate the new workflow, a total of 600 unique samples are prepared and characterized, corresponding to 50 unique DES combinations at 12 HBD/QAS molar ratios. After synthesis, a total of 230 samples are found liquid at room temperature and further characterized. Several DESs display conductivities above 1 mS cm⁻¹, with a maximum recorded conductivity of 13.7 mS cm⁻¹ for the combination of acetylcholine chloride (20 mol%) and ethylene glycol. All liquid DES samples show stable potential windows greater than 3 V. We also demonstrate that these DESs are electrochemically limited by viscosity, both in the conductivity and in the limiting processes on their cyclic voltammograms. Comparison with literature reports shows good agreement for properties measured in the high-throughput study, which helps to validate the workflow. This work demonstrates new methods to accelerate the collection of key DES metrics, providing data to formulate robust property prediction models and obtaining insight on interactions between molecular components. Data-driven high-throughput experimentation strategies can accelerate DES development for a variety of applications. Moreover, these approaches can also be extended to tackle other materials challenges with large molecular design spaces.
  4. Attention mechanisms have led to many breakthroughs in sequential data modeling but have yet to be incorporated into any generative algorithms for molecular design. Here we explore the impact of adding self-attention layers to generative β-VAE models and show that those with attention are able to learn a complex “molecular grammar” while improving performance on downstream tasks such as accurately sampling from the latent space (“model memory”) or exploring novel chemistries not present in the training data. There is a notable relationship between a model's architecture, the structure of its latent memory and its performance during inference. We demonstrate that there is an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory. However, novel sampling schemes may be used that optimize this tradeoff. We anticipate that attention will play an important role in future molecular design algorithms that can make efficient use of the detailed molecular substructures learned by the transformer.
  5. Chemical engineering is being rapidly transformed by the tools of data science. On the horizon, artificial intelligence (AI) applications will impact a huge swath of our work, ranging from the discovery and design of new molecules to operations and manufacturing and many areas in between. Early adoption of data science, machine learning, and early examples of AI in chemical engineering has been rich with examples of molecular data science, that is, the application of data science tools to molecular discovery and property optimization at the atomic scale. We summarize key advances in this nascent subfield while introducing molecular data science for a broad chemical engineering readership. We introduce the field through the concept of a molecular data science life cycle and discuss relevant aspects of five distinct phases of this process: creation of curated data sets, molecular representations, data-driven property prediction, generation of new molecules, and feasibility and synthesizability considerations.